{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Feature Transformation with Amazon a SageMaker Processing Job and Scikit-Learn\n",
    "\n",
    "In this notebook, we convert raw text into BERT embeddings.  This will allow us to perform natural language processing tasks such as text classification.\n",
    "\n",
    "Typically a machine learning (ML) process consists of few steps. First, gathering data with various ETL jobs, then pre-processing the data, featurizing the dataset by incorporating standard techniques or prior knowledge, and finally training an ML model using an algorithm.\n",
    "\n",
    "Often, distributed data processing frameworks such as Scikit-Learn are used to pre-process data sets in order to prepare them for training. In this notebook we'll use Amazon SageMaker Processing, and leverage the power of Scikit-Learn in a managed SageMaker environment to run our processing workload."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NOTE:  THIS NOTEBOOK WILL TAKE A 5-10 MINUTES TO COMPLETE.\n",
    "\n",
    "# PLEASE BE PATIENT."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](img/prepare_dataset_bert.png)\n",
    "\n",
    "![](img/processing.jpg)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Contents\n",
    "\n",
    "1. Setup Environment\n",
    "1. Setup Input Data\n",
    "1. Setup Output Data\n",
    "1. Build a Spark container for running the processing job\n",
    "1. Run the Processing Job using Amazon SageMaker\n",
    "1. Inspect the Processed Output Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup Environment\n",
    "\n",
    "Let's start by specifying:\n",
    "* The S3 bucket and prefixes that you use for training and model data. Use the default bucket specified by the Amazon SageMaker session.\n",
    "* The IAM role ARN used to give processing and training access to the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "\n",
    "sess = sagemaker.Session()\n",
    "role = sagemaker.get_execution_role()\n",
    "bucket = sess.default_bucket()\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "sm = boto3.Session().client(service_name='sagemaker', region_name=region)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup Input Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%store -r s3_public_path_tsv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    s3_public_path_tsv\n",
    "except NameError:\n",
    "    print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n",
    "    print('[ERROR] Please run the notebooks in the INGEST section before you continue.')\n",
    "    print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(s3_public_path_tsv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%store -r s3_private_path_tsv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    s3_private_path_tsv\n",
    "except NameError:\n",
    "    print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n",
    "    print('[ERROR] Please run the notebooks in the INGEST section before you continue.')\n",
    "    print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(s3_private_path_tsv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's Copy 1 More Large Data File to Use For Training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!aws s3 cp --recursive $s3_public_path_tsv/ $s3_private_path_tsv/ --exclude \"*\" --include \"amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "raw_input_data_s3_uri = 's3://{}/amazon-reviews-pds/tsv/'.format(bucket)\n",
    "print(raw_input_data_s3_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 ls $raw_input_data_s3_uri"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run the Processing Job using Amazon SageMaker\n",
    "\n",
    "Next, use the Amazon SageMaker Python SDK to submit a processing job using our custom python script."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Review the Processing Script"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!pygmentize preprocess-scikit-text-to-bert.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run this script as a processing job.  You also need to specify one `ProcessingInput` with the `source` argument of the Amazon S3 bucket and `destination` is where the script reads this data from `/opt/ml/processing/input` (inside the Docker container.)  All local paths inside the processing container must begin with `/opt/ml/processing/`.\n",
    "\n",
    "Also give the `run()` method a `ProcessingOutput`, where the `source` is the path the script writes output data to.  For outputs, the `destination` defaults to an S3 bucket that the Amazon SageMaker Python SDK creates for you, following the format `s3://sagemaker-<region>-<account_id>/<processing_job_name>/output/<output_name>/`.  You also give the `ProcessingOutput` value for `output_name`, to make it easier to retrieve these output artifacts after the job is run.\n",
    "\n",
    "The arguments parameter in the `run()` method are command-line arguments in our `preprocess-scikit-text-to-bert.py` script.\n",
    "\n",
    "Note that we sharding the data using `ShardedByS3Key` to spread the transformations across all worker nodes in the cluster."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Track the `Experiment`\n",
    "We will track every step of this experiment throughout the `prepare`, `train`, `optimize`, and `deploy`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Concepts\n",
    "\n",
    "**Experiment**: A collection of related Trials.  Add Trials to an Experiment that you wish to compare together.\n",
    "\n",
    "**Trial**: A description of a multi-step machine learning workflow. Each step in the workflow is described by a Trial Component. There is no relationship between Trial Components such as ordering.\n",
    "\n",
    "**Trial Component**: A description of a single step in a machine learning workflow. For example data cleaning, feature extraction, model training, model evaluation, etc.\n",
    "\n",
    "**Tracker**: A logger of information about a single TrialComponent.\n",
    "\n",
    "<img src=\"img/sagemaker-experiments.png\" width=\"90%\" align=\"left\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create the `Experiment`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from smexperiments.experiment import Experiment\n",
    "\n",
    "timestamp = int(time.time())\n",
    "\n",
    "experiment = Experiment.create(\n",
    "                experiment_name='Amazon-Customer-Reviews-BERT-Experiment-{}'.format(timestamp),\n",
    "                description='Amazon Customer Reviews BERT Experiment', \n",
    "                sagemaker_boto_client=sm)\n",
    "\n",
    "experiment_name = experiment.experiment_name\n",
    "print('Experiment name: {}'.format(experiment_name))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create the `Trial`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from smexperiments.trial import Trial\n",
    "\n",
    "timestamp = int(time.time())\n",
    "\n",
    "trial = Trial.create(trial_name='trial-{}'.format(timestamp),\n",
    "                     experiment_name=experiment_name,\n",
    "                     sagemaker_boto_client=sm)\n",
    "\n",
    "trial_name = trial.trial_name\n",
    "print('Trial name: {}'.format(trial_name))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create the `Experiment Config`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment_config = {\n",
    "    'ExperimentName': experiment_name,\n",
    "    'TrialName': trial.trial_name,\n",
    "    'TrialComponentDisplayName': 'prepare'\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(experiment_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store experiment_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(trial_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store trial_name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set the Processing Job Hyper-Parameters "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "processing_instance_type='ml.c5.2xlarge'\n",
    "processing_instance_count=1\n",
    "train_split_percentage=0.90\n",
    "validation_split_percentage=0.05\n",
    "test_split_percentage=0.05\n",
    "balance_dataset=True\n",
    "max_seq_length=64"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Choosing a `max_seq_length` for BERT\n",
    "Since a smaller `max_seq_length` leads to faster training and lower resource utilization, we want to find the smallest review length that captures `70%` of our reviews.\n",
    "\n",
    "Remember our distribution of review lengths from a previous section?\n",
    "\n",
    "```\n",
    "mean         67.930174\n",
    "std         130.954079\n",
    "min           1.000000\n",
    "10%           4.000000\n",
    "20%          14.000000\n",
    "30%          21.000000\n",
    "40%          25.000000\n",
    "50%          31.000000\n",
    "60%          42.000000\n",
    "70%          59.000000\n",
    "80%          87.000000\n",
    "90%         149.000000\n",
    "100%       5347.000000\n",
    "max        5347.000000\n",
    "```\n",
    "\n",
    "![](img/review_word_count_distribution.png)\n",
    "\n",
    "Review length `59` represents the `70th` percentile for this dataset.  However, it's best to stick with powers-of-2 when using BERT.  So let's choose `64` as this is the smallest power-of-2 greater than `59`.  Reviews with length > `64` will be truncated to `64`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from sagemaker.sklearn.processing import SKLearnProcessor\n",
    "\n",
    "processor = SKLearnProcessor(framework_version='0.23-1',\n",
    "                             role=role,\n",
    "                             instance_type=processing_instance_type,\n",
    "                             instance_count=processing_instance_count,\n",
    "                             max_runtime_in_seconds=7200)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.processing import ProcessingInput, ProcessingOutput\n",
    "\n",
    "processor.run(code='preprocess-scikit-text-to-bert.py',\n",
    "              inputs=[\n",
    "                    ProcessingInput(source=raw_input_data_s3_uri,\n",
    "                                    destination='/opt/ml/processing/input/data/',\n",
    "                                    s3_data_distribution_type='ShardedByS3Key')\n",
    "              ],\n",
    "              outputs=[\n",
    "                    ProcessingOutput(s3_upload_mode='EndOfJob',\n",
    "                                     output_name='bert-train',\n",
    "                                     source='/opt/ml/processing/output/bert/train'),\n",
    "                    ProcessingOutput(s3_upload_mode='EndOfJob',\n",
    "                                     output_name='bert-validation',\n",
    "                                     source='/opt/ml/processing/output/bert/validation'),\n",
    "                    ProcessingOutput(s3_upload_mode='EndOfJob',\n",
    "                                     output_name='bert-test',\n",
    "                                     source='/opt/ml/processing/output/bert/test'),\n",
    "              ],\n",
    "              arguments=['--train-split-percentage', str(train_split_percentage),\n",
    "                         '--validation-split-percentage', str(validation_split_percentage),\n",
    "                         '--test-split-percentage', str(test_split_percentage),\n",
    "                         '--max-seq-length', str(max_seq_length),\n",
    "                         '--balance-dataset', str(balance_dataset)\n",
    "              ],\n",
    "              experiment_config=experiment_config,\n",
    "              logs=True,\n",
    "              wait=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "scikit_processing_job_name = processor.jobs[-1].describe()['ProcessingJobName']\n",
    "print(scikit_processing_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/{}\">Processing Job</a></b>'.format(region, scikit_processing_job_name)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://console.aws.amazon.com/cloudwatch/home?region={}#logStream:group=/aws/sagemaker/ProcessingJobs;prefix={};streamFilter=typeLogStreamPrefix\">CloudWatch Logs</a> After About 5 Minutes</b>'.format(region, scikit_processing_job_name)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a target=\"blank\" href=\"https://s3.console.aws.amazon.com/s3/buckets/{}/{}/?region={}&tab=overview\">S3 Output Data</a> After The Processing Job Has Completed</b>'.format(bucket, scikit_processing_job_name, region)))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Monitor the Processing Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "running_processor = sagemaker.processing.ProcessingJob.from_processing_name(processing_job_name=scikit_processing_job_name,\n",
    "                                                                            sagemaker_session=sess)\n",
    "\n",
    "processing_job_description = running_processor.describe()\n",
    "\n",
    "print(processing_job_description)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "processing_job_name = processing_job_description['ProcessingJobName']\n",
    "print(processing_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "running_processor.wait(logs=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# _Please Wait Until the ^^ Processing Job ^^ Completes Above._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inspect the Processed Output Data\n",
    "\n",
    "Take a look at a few rows of the transformed dataset to make sure the processing was successful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "processing_job_description = running_processor.describe()\n",
    "\n",
    "output_config = processing_job_description['ProcessingOutputConfig']\n",
    "for output in output_config['Outputs']:\n",
    "    if output['OutputName'] == 'bert-train':\n",
    "        processed_train_data_s3_uri = output['S3Output']['S3Uri']\n",
    "    if output['OutputName'] == 'bert-validation':\n",
    "        processed_validation_data_s3_uri = output['S3Output']['S3Uri']\n",
    "    if output['OutputName'] == 'bert-test':\n",
    "        processed_test_data_s3_uri = output['S3Output']['S3Uri']\n",
    "        \n",
    "print(processed_train_data_s3_uri)\n",
    "print(processed_validation_data_s3_uri)\n",
    "print(processed_test_data_s3_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!aws s3 ls $processed_train_data_s3_uri/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!aws s3 ls $processed_validation_data_s3_uri/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 ls $processed_test_data_s3_uri/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pass Variables to the Next Notebook(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store processing_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%store processed_train_data_s3_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%store processed_validation_data_s3_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%store processed_test_data_s3_uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%store max_seq_length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store experiment_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store trial_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Show the Experiment Tracking Lineage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.analytics import ExperimentAnalytics\n",
    "\n",
    "import pandas as pd\n",
    "pd.set_option(\"max_colwidth\", 500)\n",
    "#pd.set_option(\"max_rows\", 100)\n",
    "\n",
    "experiment_analytics = ExperimentAnalytics(\n",
    "    sagemaker_session=sess,\n",
    "    experiment_name=experiment_name,\n",
    "    sort_by=\"CreationTime\",\n",
    "    sort_order=\"Descending\"\n",
    ")\n",
    "\n",
    "experiment_analytics_df = experiment_analytics.dataframe()\n",
    "experiment_analytics_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trial_component_name=experiment_analytics_df.TrialComponentName[0]\n",
    "print(trial_component_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trial_component_description=sm.describe_trial_component(TrialComponentName=trial_component_name)\n",
    "trial_component_description"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.lineage.visualizer import LineageTableVisualizer\n",
    "\n",
    "lineage_table_viz = LineageTableVisualizer(sess)\n",
    "lineage_table_viz_df = lineage_table_viz.show(processing_job_name=processing_job_name)\n",
    "lineage_table_viz_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from sagemaker.lineage.visualizer import LineageTableVisualizer\n",
    "\n",
    "# lineage_table_viz = LineageTableVisualizer(sess)\n",
    "# lineage_table_viz_df = lineage_table_viz.show(trial_component_name=trial_component_name)\n",
    "# lineage_table_viz_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.analytics import ArtifactAnalytics\n",
    "\n",
    "artifact_analytics = ArtifactAnalytics()\n",
    "\n",
    "artifacts_df = artifact_analytics.dataframe()\n",
    "artifacts_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# from sagemaker.lineage.association import Association\n",
    "# from sagemaker.lineage.artifact import Artifact\n",
    "\n",
    "# incoming_associations = Association.list(destination_arn='arn:aws:sagemaker:us-east-1:835319576252:artifact/ebfc31f9f6007feb1a70daaf24339c08',\n",
    "#                                 sort_by='CreationTime', \n",
    "#                                 sort_order='Descending')\n",
    "\n",
    "# for association in enumerate(incoming_associations):\n",
    "#     print(association)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from sagemaker.lineage.association import Association\n",
    "# from sagemaker.lineage.artifact import Artifact\n",
    "\n",
    "# incoming_associations = Association.list(source_arn='arn:aws:sagemaker:us-east-1:835319576252:artifact/ebfc31f9f6007feb1a70daaf24339c08',\n",
    "#                                 sort_by='CreationTime', \n",
    "#                                 sort_order='Descending')\n",
    "\n",
    "# for association in enumerate(incoming_associations):\n",
    "#     print(association)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # list all the contexts\n",
    "# from sagemaker.lineage.context import Context\n",
    "\n",
    "# contexts = Context.list(sort_by='CreationTime', \n",
    "#                         sort_order='Descending')\n",
    "\n",
    "# for ctx in contexts:\n",
    "#     print(ctx.source.source_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from sagemaker.lineage.action import Action\n",
    "\n",
    "# actions = Action.list(sort_by='CreationTime', sort_order='Descending')\n",
    "\n",
    "# for action in actions:\n",
    "#     print(action)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%javascript\n",
    "Jupyter.notebook.save_checkpoint();\n",
    "Jupyter.notebook.session.delete();"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
